Encoding method adapted to the recognition of synchronization words in sequences of variable length encoded words

ABSTRACT

The encoding method is adapted to the recognition of synchronization words in sequences of variable length encoded words, each encoded word corresponding to a specific message or event from a sequence of messages each assigned a probability of occurrence. The method consists in constructing a dichotomizing encoding tree comprising a specific number of leaves (F1 . . . F4) distributed over priority levels (n1 . . . n4) classified in a decreasing order on proceeding from the root of the tree towards the leaves, the number of leaves being equal to the number of events to be encoded, in placing the leaves adjacent to one another on each level, the leaves corresponding to the most probable events being placed at one end to the right or to the left on each level, and in prohibiting, in the encoded words, the sequences of Smax bits of equal value, zero or one, in order to obtain an easy choice of synchronization words.

The present invention relates to a method for recognition of synchronization words in sequences of variable length encoded words. The invention is applied, in particular, in the construction of flow-rate compression systems.

In order to limit the flow rate of information transmitted between a transmitter and a digital television receiver, use is often made of the statistical encoding method of Huffman. However, this method involves the construction of variable length code words whose maximum length, which is not controllable, leads to complicated constructions of encoding devices. On the other hand, as the codes have non-uniform lengths, the binary words which they form may be distorted, which thus provokes transmission errors.

In order to limit the propagation and the effect of these errors it is necessary, in general, to partition the information into blocks by inserting synchronization words between the blocks in order to safeguard the synchronization of the receivers in the case of an error. This protection can be obtained by three different methods. A first method consists in ignoring the type of information which is transmitted and in placing a synchronization word after a specific number N of bits used in the transmission. The advantage of this method is that it fixes the lengths of the blocks and precisely determines the location of the synchronization words. Under these conditions, it is possible to use error-correcting codes to protect preferentially the synchronization words. A second method consists in placing a synchronization word after N encoded events. Under these conditions, the location of the synchronization words is not known. However, there exists a possibility of detecting errors since there must always exist a block separation word after N events. Finally, the third method consists in carrying out an encoding in which the number of encoded events and the number of bits per block are not necessarily known. In these last two cases, it is paramount to protect, or at least to recognize, the synchronization words in order to locate effectively the disturbances caused by the errors. However, the existing error-correcting codes do not allow the protection of the synchronization words to be favoured when the locations of the latter are not known.

Hence, the aim of the invention is to palliate this disadvantage.

To this end, the subject of the invention is an encoding method adapted to the recognition of synchronization words in sequences of variable length encoded words, each code word corresponding to a specific message or event from a sequence of messages each assigned a probability of occurrence, of the type consisting in the construction of a dichotomizing encoding tree comprising a specific number of leaves distributed over priority levels classified in a decreasing order on proceeding from the root of the tree towards the leaves, the number of leaves being equal to the number of events to be encoded, characterized in that it consists in placing the leaves adjacent to one another on each level, the leaves corresponding to the most probable events being placed at a single right or left end on each level, and in prohibiting, in the encoded words, the sequences of Smax bits of equal value, zero or one.

The invention will be better understood and other characteristics will emerge with the aid of the description which follows with reference to the attached figures which show:

FIGS. 1A, 1B and 1C, synchronization words interposed in bit sequences;

FIG. 2, an illustration of a generation of a left complete encoding tree in the encoding algorithm according to the invention;

FIG. 3, an example to illustrate a left incomplete tree structure;

FIG. 4, an example of the obtainment of a left complete encoding tree comprising a maximum number of zeros equal to 2;

FIG. 5, an illustration of the conventional encoding method of Huffman;

FIG. 6, an illustration of the encoding method implemented by the invention;

FIG. 7, an illustration of the encoding method implemented by the invention obtained by limiting the depth of the encoding;

FIG. 8, a flow diagram to illustrate the encoding method according to the invention;

FIGS. 9A and 9B, an example of an implementation of the previous encoding method to limit the length of the synchronization words on placing prohibited leaves.

The conventional method used to locate a synchronization word in a sequence of binary words consists in intercomparing the weight of several successive bit sequences, each sequence comprising a specific number of bits. As this weight is defined on a bit sequence by the sum of all the "1" bits of the sequence, it is clear that on giving, for example, a weight equal to "0" to the synchronization word, this word becomes perfectly identifiable among the other sequences which have weights greater than "0".

Thus, in the bit sequences shown in FIG. 1A the sequence of six "0" bits, labelled MS, which is interposed between two sequences labelled C comprising several "1" bits, is perfectly recognizable since its weight is zero and the weights of the sequences to the left and to the right of the word MS are respectively three and four. In contrast, if an error occurs, for example, in the starred location of the word C to the left, the word MS is still recognized, but with a location error like that appearing in FIG. 1B. The same is true each time that the word C finishes with zeros. These problems can however be solved by adding, as is shown in FIG. 1C, a "1" bit at the end, for example, of the synchronization word MS. Moreover, the solution generalizes to any number T of errors. In this case, the location of the synchronization word can be made certain on condition that the word contains at least T+1 "1" bits. In fact, it may be noted that, without taking account of the location of the T+1 "1" bits in the synchronization word nor of its length, if T "1" bits are erased in the synchronization word, the position of the word MS is maintained even if the latter is preceded by "0"s, by virtue of the presence of the remaining "1". The process of detection of the synchronization word MS results from the convolution product of sequences Q and MS. Considering, for example, the Z transforms of the synchronization word MS and of the sequence Q of the encoded words such as ##EQU1## with Q_(K) =MS_(K) for K=0, 1, . . . S-1 the convolution product of Q(Z) and of MS(Z) is expressed by the relationship ##EQU2## where Pj is the Hamming distance D(Ej.MS) between the sequence Ej and the synchronization word MS, this distance being able to be measured by an "exclusive OR" operation between any bit sequence situated to the left or to the right of the synchronization word and the synchronization word itself.

The synchronization word MS may then be identified in the presence of T errors only if the Hamming distance Pj previously defined is greater than or equal to T+1 for j different from 0. It is sufficient to interpose "1" bits in the synchronization word MS so as to satisfy the condition Pj greater than or equal to T+1 and j smaller than 0.
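By way of illustration only, the "exclusive OR" comparison described above can be sketched as a sliding-window search. The function names `hamming` and `locate_sync`, the representation of bit sequences as strings of "0" and "1", and the example stream are assumptions made for this sketch and do not appear in the figures.

```python
def hamming(a, b):
    """Count the positions where two equal-length bit strings differ
    (the weight of their bitwise exclusive OR)."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def locate_sync(stream, sync, t):
    """Return every offset j whose window of len(sync) bits lies within
    Hamming distance t of the synchronization word."""
    s = len(sync)
    return [j for j in range(len(stream) - s + 1)
            if hamming(stream[j:j + s], sync) <= t]


# Hypothetical example: six "0" bits followed by a "1", as in the spirit of FIG. 1C.
SYNC = "0000001"
STREAM = "1011010" + SYNC + "1101101"
```

With no tolerated error, `locate_sync(STREAM, SYNC, 0)` finds the single offset 7. With one tolerated error the neighbouring offsets 6 and 8 are also accepted, which suggests why a word containing a single "1" bit cannot by itself guarantee the location when errors are allowed.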

However, when the number of the events to be encoded is large the Huffman encoding method leads to the use of long synchronization words, which are difficult to find since numerous code word concatenations have to be observed. In order to simplify the search for the synchronization word, a first solution can consist in prohibiting 0 or 1 sequences in the Huffman encoding tree. But the cost in flow-rate brought about by this solution can be very great since the constraint intervenes even on the most probable event.

The solution implemented by the invention consists in defining an algorithm enabling facilitation of the choice of the synchronization words, enabling unconstrained generation of left complete or respectively right complete trees as is shown in FIG. 2. A left complete tree is a tree in which all the leaves of each level are adjacent, that is to say, in which there exists no intermediate node to separate the leaves on a single level. In FIGS. 2 and 3 the leaves (F1 . . . F4) are represented by circles and the intermediate nodes by points.
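The adjacency property that defines a complete tree can be checked level by level. The sketch below is only an illustration: it assumes that each level is written as a string read from left to right, with "L" standing for a leaf and "N" for an intermediate node; this notation and the function names are hypothetical and are not taken from the figures.

```python
def leaves_adjacent(level):
    """True if all the leaves ('L') of one level form a single contiguous run,
    i.e. no intermediate node ('N') separates two leaves."""
    first = level.find("L")
    if first == -1:
        return True  # a level without leaves satisfies the property trivially
    last = level.rfind("L")
    return "N" not in level[first:last + 1]


def is_complete(levels):
    """True if every level of the tree satisfies the adjacency property."""
    return all(leaves_adjacent(level) for level in levels)


# Hypothetical four-leaf tree, one string per level from the root downwards.
EXAMPLE_LEVELS = ["N", "LN", "LN", "LL"]
```

For instance `is_complete(EXAMPLE_LEVELS)` holds, whereas a level such as "LNLN" fails the test because an intermediate node separates two leaves.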

In order to limit the length of the synchronization word, according to the invention sequences of more than Smax "0"s (respectively Smax "1"s) are prohibited in the code words. For example in FIG. 4, more than two consecutive "0"s are prohibited in the encoding tree and three "0"s are reserved for the synchronization word. In this way all the binary sequences which finish with Smax+1 "0"s (respectively Smax+1 "1"s) lead to prohibited leaves. An example of a prohibited leaf is shown in FIG. 4 by a triangle. The search for the length to be given to the synchronization word MS from the preceding algorithm must take account, on the one hand, of the fact that in order to make certain of the location of a word MS in the presence of T or more errors, the word must contain T+1 "1" bits, and on the other hand, of the result PK of the "exclusive OR" operation defined by equation (1), carried out on any sequence of length S with the synchronization word MS. In fact, if no error is present, the result PK is null and the synchronization word is recognized locationwise without error. However, in the presence of V errors, V being less than or equal to T, the result PK is less than or equal to T and MS is still recognized locationwise in the presence of V errors. Consequently, the minimum Hamming distance PK (Pmin) of the sequence C relative to MS must be at least equal to T+1 in the presence of T errors. If a synchronization word of S bits tolerates the presence of T errors at most, there will be at most T errors on any sequence of S bits and at worst T "1" bits will be erased from the sequence C. The combined contribution on P is hence, for MS=0

    P=Pmin+T+D=2T+1+D

where D is the Hamming distance.

As the synchronization word in fact comprises T+1 "1" bits, if at worst T+1 "1" bits of any sequence coincide with T+1 "1" bits of the synchronization word, the contribution of these bits to the result PK is null. In general, if T "1" bits of any sequence belonging to C coincide with at most B of the T+1 "1" bits of the synchronization word, the contribution of these bits is null. These considerations enable the definition of the weight P which has to be given to a sequence C of length S in order to obtain protection against T errors. This weight is defined through the equation

    P=B+T+1+T                                                  (2)

B being less than or equal to T+1. The maximum weight of any code sequence other than MS is therefore equal to 3T+2. Consequently, under the assumption that it can be taken as certain that a weight of "1" will be found every (Smax+1) bits in the encoded words, the value of S representing the total length of the synchronization word satisfies the equation

    S=P.(Smax+1)                                               (3)

or again

    S=(3T+2).(Smax+1)                                          (4)

To summarize, the synchronization word MS must have a length S equal to (3T+2)(Smax+1) and comprise T+1 "1" bits.
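As a numerical illustration of equation (4) and of the run-length prohibition, a minimal sketch follows. The function names and the example code words are assumptions made for illustration; only the formula S=(3T+2).(Smax+1) and the limit of Smax consecutive equal bits inside a code word are taken from the description.

```python
def sync_word_length(t, smax):
    """Length S of the synchronization word according to equation (4)."""
    if t < 0 or smax < 1:
        raise ValueError("t must be >= 0 and smax must be >= 1")
    return (3 * t + 2) * (smax + 1)


def longest_run(word, bit="0"):
    """Length of the longest run of the given bit inside one code word."""
    best = run = 0
    for b in word:
        run = run + 1 if b == bit else 0
        best = max(best, run)
    return best


def respects_run_limit(codewords, smax, bit="0"):
    """True if no code word contains more than smax consecutive equal bits."""
    return all(longest_run(w, bit) <= smax for w in codewords)
```

For example, with T=1 tolerated error and Smax=2, `sync_word_length(1, 2)` gives 15 bits, and the hypothetical code words "1", "01" and "001" pass `respects_run_limit` while "000" does not.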

The algorithms implemented by the invention for generating the left complete, respectively right complete trees, are borrowed from the conventional method of Huffman. According to the method of Huffman, each set of N events to be encoded is encoded in a set of N binary words or messages, the latter being numbered such that the first encoded message always corresponds to the most probable event and the last message N corresponds to the event having the lowest probability of appearance. The messages 1 to N are thus classified in priority order such that

    P(1)≧P(2)≧P(3) . . . ≧P(N)            (5)

with the condition that ##EQU3##

In these algorithms the encoding tree possesses a binary structure which is either empty, or formed by a root, by a binary tree called a left sub-tree (L.S.T) and by a binary tree called a right sub-tree (R.S.T).

By convention, an empty tree is the same as a leaf. A left sub-tree is reached by traversing a branch assigned with 0 and a right sub-tree is reached by traversing a branch assigned with 1. This structure enables representation of the N messages by a binary tree. The code is read by proceeding from the root and by traversing the branches of the tree up until the moment when a leaf is encountered, each leaf representing one event from the N events. The succession of the bits assigned to the branches which enables access to a leaf thus forms the corresponding event code. In this way, if the length L(I) of a code word I, where I belongs to the set of events from 1 to N, is determined by the number of bits contained in the latter, the average length of the code is obtained from the equation ##EQU4## and the inequality

    L(1)≦L(2)≦L(3) . . . ≦L(N)            (8)

must be satisfied.
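The reading of a code by traversal from the root can be sketched as follows. This is a minimal illustration under stated assumptions: the tree is represented as nested pairs (left sub-tree, right sub-tree) with event names as leaves, at least two events are encoded, and the average length is taken as the usual sum of P(I).L(I), the equation itself not being reproduced in the text. The names `decode`, `code_lengths`, `average_length` and the example tree are hypothetical.

```python
def decode(bits, tree):
    """Walk from the root: '0' takes the left sub-tree, '1' the right one.
    Each time a leaf is reached, its event is emitted and the walk restarts."""
    events, node = [], tree
    for b in bits:
        node = node[0] if b == "0" else node[1]
        if not isinstance(node, tuple):  # a leaf has been reached
            events.append(node)
            node = tree
    if node is not tree:
        raise ValueError("bit sequence ends inside a code word")
    return events


def code_lengths(tree, depth=0):
    """Map each event to the number of branches between the root and its leaf."""
    if not isinstance(tree, tuple):
        return {tree: depth}
    lengths = code_lengths(tree[0], depth + 1)
    lengths.update(code_lengths(tree[1], depth + 1))
    return lengths


def average_length(probs, lengths):
    """Assumed definition: sum over the events of P(I) * L(I)."""
    return sum(probs[e] * lengths[e] for e in probs)


# Hypothetical tree giving the codes A=0, B=10, C=110, D=111.
TREE = ("A", ("B", ("C", "D")))
PROBS = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}
```

With these assumed values, `decode("0101100111", TREE)` yields the events A, B, C, A, D, and the average length is 1.75 bits per event.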

The conventional method of Huffman consists in encoding events from leaves in order to return towards the root of the tree. According to the Huffman algorithm, the N probabilities of the messages or leaves are classified in an increasing order and the two lowest probabilities are set aside in order to form the root of a tree carrying the sum of these two probabilities. The N-1 remaining probabilities are then reordered in an increasing order, then the preceding operations are begun again up until the moment when the number of probabilities of the sort which is being carried out becomes null. Thus, for example, the encoding of messages A', B', C', D', E', F', G', H', I', J', K', L', M' respectively assigned the following probabilities of occurrence 0.2; 0.18; 0.1; 0.1; 0.1; 0.06; 0.06; 0.04; 0.04; 0.04; 0.04; 0.03 and 0.01 and leaf numerals (1, 2, . . . 13) gives, on using the previously described algorithm, the tree which is shown in FIG. 5.
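The bottom-up merging just described can be sketched with a priority queue. The sketch below is an illustration only: the function name `huffman_lengths` is hypothetical, only code lengths are computed, and because several probabilities are equal, the way ties are broken here is an assumption, so the resulting lengths need not coincide leaf for leaf with the tree of FIG. 5.

```python
import heapq
from itertools import count


def huffman_lengths(probs):
    """Return the code length of each message obtained by repeatedly merging
    the two lowest probabilities, as in the conventional Huffman method."""
    tie = count()  # tie-breaker so that the heap never compares the lists
    heap = [(p, next(tie), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, a = heapq.heappop(heap)
        p2, _, b = heapq.heappop(heap)
        for i in a + b:  # every leaf under the new root gains one bit
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, next(tie), a + b))
    return lengths


# Probabilities of the messages A' to M' given in the description.
PROBS_FIG5 = [0.2, 0.18, 0.1, 0.1, 0.1, 0.06, 0.06,
              0.04, 0.04, 0.04, 0.04, 0.03, 0.01]
```

Calling `huffman_lengths(PROBS_FIG5)` gives one length per message. The maximum of these lengths is only known once all the merges are done, which is the point made in the paragraph that follows.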

However, since the encoding is determined by a movement from the leaves towards the roots, the depth of the tree, which is also the depth which corresponds to the number of bits of the least probable word, is not known until the latter is constructed. This fact may be prejudicial in obtaining a low complexity suited to encoding as well as decoding.

The method according to the invention palliates this disadvantage through the algorithms described hereafter which enable the encoding of the events, not by traversing the encoding tree from the leaves towards the roots, but on the contrary, by proceeding from the root in order to descend towards the leaves.

A first method called "TOP-DOWN" consists in constructing a left (respectively right) complete tree whilst each time checking that the leaves of messages 1 to K-1 are placed in the tree, in placing the leaf of the message K as far as possible to the right (respectively to the left) at the depth B under consideration, as is shown in FIG. 6, and the number M of leaves or of roots of the tree at the depth B+1 is determined through the equation ##EQU5##

As in the Huffman algorithm the leaves or tree roots K+1 to K+N carry a probability P(I), insofar as a leaf is concerned, or a sum of probabilities insofar as a tree root is concerned. The act of trying to construct a left complete tree comes down to carrying out a sort simulation by increasing order of the probabilities at the depth B+1.

Since the probabilities are assumed to be sorted, if the relationship ##EQU6## is satisfied, then the leaf F(K) may be assumed to be placed in the correct position.

In the opposite case, if the relationship: ##EQU7## is satisfied, then it is necessary to compare the probability P(K) with the sum of the probabilities P(M+K-1) and P(M+K-2) of the leaves K+M-1 and K+M-2.

If P(M+K-1)+P(M+K-2) is less than the probability P(K), the leaf F(K) is not in place and the length of the code of the message K must be incremented by 1. By contrast, if P(M+K-1)+P(M+K-2) is greater than or equal to P(K), the unique solution consists in constructing two trees, a first tree in which the leaf F(K) is placed at this position and a second tree on incrementing by 1 the length of the code word K.

The subsequent operations then consist, if the leaf F(K) is assumed to be placed in the correct position, in attempting to place the leaf F(K+1) at the same depth B as the leaf F(K). However if the length of the code word K is incremented by 1, the previous algorithm is executed once again up until the moment when the leaf F(K) is actually placed. In the case when two trees are to be constructed, the tree in which the leaf is assumed to be placed is completed and the conditions necessary for the continuation of the construction of the other tree are stacked up.

A corresponding program enabling the execution of the previously described algorithm is as follows.

    ____________________________________________________________
    (INITIALIZATION FOR THE FIRST LEAF F1 OF THE FIRST TREE)
    ____________________________________________________________
    WHILE ALL THE TREES HAVE NOT BEEN CONSTRUCTED DO:
      WHILE THE CURRENTLY PROCESSED TREE IS NOT COMPLETED DO
        IF I ≧ N
          THEN COMPLETE THE CURRENTLY PROCESSED TREE
          ##STR1##
            THEN PLACE THE LEAF FK
                 (INITIALIZATION PHASE TAKEN INTO ACCOUNT (?))
            ELSE IF P(K) ≧ P(I-1) + P(I-2)
              THEN (CONSTRUCT TWO TREES)
                   LEAF FK PLACED
                   LENGTH OF MESSAGE K INCREASED BY 1
                   (CONDITIONS STACKED)
              ELSE INCREASE THE LENGTH OF THE MESSAGE K BY 1
                   (END OF INITIALIZATION PHASE)
            ENDIF.
          ENDIF;
        ENDIF;
        (THE PROCESSING CONTINUES WHILE THE TREE IS NOT COMPLETE)
      DONE:
      THE BEST TREE IS RETAINED, AND THE TREE CORRESPONDING TO THE
      LAST CONDITIONS STACKED IS CONSTRUCTED.
      IF THE STACK IS EMPTY, FINISH
    DONE;
    END.
    ____________________________________________________________

The preceding algorithm can be perfected by limiting the encoding depth. In fact, if Pmax denotes the maximum authorized encoding depth and if the number of events N is less than or equal to 2^(Pmax), it is possible to find the optimum tree which possesses the depth Pmax. In order to obtain this tree, it is sufficient to insist that the message numeral K has a length incremented by 1 if the number of leaves possible at the level Pmax is less than the number of messages remaining to be placed N-K. As the variable I=M+K is less than the number N of leaves to be placed, the preceding constraint is the tightest. This method is advantageous since it corresponds to the minimum constraint necessary to a strict limitation of the depth and it enables minimization of the loss in the average flow-rate. The number of leaves possible at the depth Pmax is then given by the equation: ##EQU8##
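The depth-limitation test can be illustrated by the following sketch. Since the equation giving the number of leaves possible at the depth Pmax is not reproduced in the text, the count used here is an assumption: a leaf placed with a code of length L is taken to use up 2^(Pmax-L) of the 2^Pmax positions of the level Pmax. The function names are likewise hypothetical.

```python
def depth_limit_feasible(n, pmax):
    """N events can be encoded within depth Pmax only if N <= 2**Pmax."""
    return n <= 2 ** pmax


def free_leaves_at_pmax(placed_lengths, pmax):
    """Assumed count of the positions still free at depth pmax once leaves
    have been placed with the given code lengths."""
    if any(length > pmax for length in placed_lengths):
        raise ValueError("a placed leaf is deeper than pmax")
    return 2 ** pmax - sum(2 ** (pmax - length) for length in placed_lengths)


def must_lengthen(placed_lengths, candidate_length, n, pmax):
    """True if placing message K (the next one) at candidate_length would
    leave fewer free positions at depth pmax than the N-K messages remaining."""
    k = len(placed_lengths) + 1  # 1-based numeral of the message being placed
    remaining = n - k
    free_after = free_leaves_at_pmax(placed_lengths + [candidate_length], pmax)
    return free_after < remaining
```

For example, with N=5 and Pmax=3, the first message may be placed at length 1, but the second cannot stay at length 2, because only two positions would remain for three messages; it must be lengthened to 3 bits.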

A corresponding program whose steps 1 to 18 are illustrated by the flow diagram in FIG. 8 is as follows:

    ____________________________________________________________
    (INITIALIZATION FOR THE FIRST LEAF F1 OF THE FIRST TREE)
    ____________________________________________________________
    WHILE ALL THE TREES HAVE NOT BEEN CONSTRUCTED DO:
      WHILE THE TREE IS NOT COMPLETED DO
        IF I ≧ N
          THEN COMPLETE THE CURRENTLY PROCESSED TREE
          ELSE IF N-K ≦ FPMAX
            ##STR2##
              THEN PLACE THE LEAF FK
                   (INITIALIZATION PHASE TAKEN INTO ACCOUNT (?))
                   MODIFICATION OF FPMAX.
              ELSE IF P(K) > P(I-1) + P(I-2)
                THEN (CONSTRUCT TWO TREES)
                     LEAF FK PLACED
                     MODIFICATION OF FPMAX.
                     LENGTH OF THE MESSAGE K INCREASED BY 1
                     (CONDITIONS STACKED).
                ELSE INCREASE THE LENGTH OF THE MESSAGE K BY 1
                     (END OF INITIALIZATION PHASE).
                     MODIFICATION OF FPMAX
              ENDIF;
            ENDIF.
          ELSE IF N-K > FPMAX
            IT IS OBLIGATORY TO INCREASE THE LENGTH OF THE CODE WORD K BY 1.
            (END OF INITIALIZATION PHASE)
            MODIFICATION OF FPMAX.
        ENDIF;
        (THE PROCESSING CONTINUES WHILE THE TREE IS NOT COMPLETED)
      DONE;
      THE BEST TREE IS RETAINED, AND THE TREE CORRESPONDING TO THE
      LAST CONDITIONS STACKED IS CONSTRUCTED.
      IF THE STACK IS EMPTY, FINISH
    DONE;
    END;
    ____________________________________________________________

The satisfaction of the constraints imposed by the search for the synchronization word, namely that sequences of more than Smax "0"s in a tree are prohibited and that prohibited leaves are inserted in the tree adopted, can be followed with the aid of a particular representation according to a table with two dimensions in which the first, vertical dimension represents the levels in the tree and the second, horizontal dimension characterizes the distribution of the intermediate nodes, leaves and prohibited leaves.

As shown in FIGS. 9A and 9B a positive number characterizes a number of leaves placed side by side in the tree, a negative number characterizes a number of intermediate nodes placed side by side and the representation of a "0" characterizes a prohibited leaf. The last element of each level is a pointer which indicates at all times the address of the last element placed. This pointer belongs to the stacked conditions. It enables, on defoliation of an incomplete tree, the continuation of the calculations at the useful leaf. If SI denotes a sequence of I "0"s (respectively I "1"s) less than or equal to Smax, in order to construct the condition on the sequences of the "0"s it is necessary to know at all times during the execution of the algorithm the set of the values of the sequences SI at the current level B and at the subsequent level B+1. The sequences SI are sequences of "0"s on which it is attempted to place the leaves, the intermediate nodes and the prohibited leaves. Denoting by SIK the sequence of "0"s of the tree in which it is attempted to place the current leaf of numeral K, and whilst conforming to the depth limitation, it is necessary to calculate at all times the variations of FPmax as a function of SIK, LK (length of the current code word) and Pmax. Two functions are then used. The first function FP(SIK,LK,Pmax) calculates the modification in the possible number of leaves at the level Pmax when the leaf K is assumed to be placed, the leaves 1 to K-1 being already placed in the tree. The second function FNP(SIK,LK,Pmax) enables calculation of the modification of the possible number of leaves at the level Pmax when, the leaves 1 to K-1 being already placed in the tree, the leaf K is assumed to be placed with its length incremented by 1. The variations in the number of leaves at the level Pmax are calculated only once if necessary, and the corresponding results are used numerous times in the course of the operation of the algorithm.

When a code word numeral K sees its length incremented by 1 the prohibited leaves of the current level B can then no longer be eliminated and this contributes to the loss in the average length of the code but may nevertheless serve for the detection of errors. By contrast, the possible prohibited leaves at the level B+1 may be eliminated. In fact, if having placed the leaf numeral K-1, it is attempted to place the leaf numeral K at the same level on the sequence Smax of "0"s, a prohibited leaf disappears. In the operation of the algorithm all the possible prohibited leaves must be considered.
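The two-dimensional table representation can be sketched as follows. This is an illustration under stated assumptions: each level is a list of integers in which a positive value counts adjacent leaves, a negative value counts adjacent intermediate nodes and a 0 marks one prohibited leaf; the pointer stored as last element of each level is left out of the sketch, and the example rows are hypothetical and are not those of FIGS. 9A and 9B.

```python
def expand_row(row):
    """Expand one table row into a string, left to right:
    'L' for a leaf, 'N' for an intermediate node, 'X' for a prohibited leaf."""
    out = []
    for value in row:
        if value > 0:
            out.append("L" * value)
        elif value < 0:
            out.append("N" * (-value))
        else:
            out.append("X")
    return "".join(out)


def rows_consistent(rows):
    """Check that each level holds exactly two elements per intermediate node
    of the level above, and that the last level has no intermediate node."""
    expected = 1  # the root level holds a single element
    for row in rows:
        level = expand_row(row)
        if len(level) != expected:
            return False
        expected = 2 * level.count("N")
    return expected == 0


# Hypothetical table: three leaves and one prohibited leaf over four levels.
EXAMPLE_ROWS = [[-1], [1, -1], [1, -1], [1, 0]]
```

Here `expand_row([1, 0])` gives "LX", and `rows_consistent(EXAMPLE_ROWS)` holds because every intermediate node has its two successors on the next level.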

I claim:
 1. Encoding method adapted to the recognition of synchronization words in sequences of variable length encoded words, each encoded word corresponding to a specific message or event from a sequence of messages, each message assigned with a probability of occurrence, wherein a dichotomizing encoding tree comprising a specific number of leaves distributed over priority levels n₁ . . . n₄ classified in a decreasing order on proceeding from the root of the tree towards the leaves is constructed, the number of leaves being equal to the number of events to be encoded, said method comprising: placing the leaves adjacent to one another on each level, the leaves corresponding to the most probable events being placed at a single right or left end on each level, and prohibiting, in the encoded words, sequences of 1's or 0's greater than a predetermined number, Smax, of bits.

 2. Method according to claim 1, characterized in that the length of the synchronization word is equal to (3T+2) times the number of bits Smax+1, T denoting the maximum number of tolerated bit transmission errors.

 3. Method according to either of claims 1 and 2, wherein to place a leaf F(K) at a definite position of a priority level (n_(i)) in the encoding tree, said method comprises: comparing the probability P(K) of the leaf F(K) with the sum S of the probabilities of the leaves remaining to be placed, where ##EQU9## placing the leaf F(K) at said definite position if the probability P(K) of leaf F(K) is greater than or equal to S, and comparing in the contrary case the probability P(K) of leaf F(K) with the sum P(M+K-1)+P(M+K-2) of the probabilities of the leaves F(M+K-1) and F(M+K-2) situated at the lower priority level to increment by 1 the length of the message K, or to maintain the leaf F(K) at this position.

 4. Method according to claim 3, comprising: reading the event codes by proceeding from the root and by traversing the branches of the tree up until the moment a leaf is encountered.